Abstract

Writing concurrent programs is notoriously difficult, and is of increasing practical importance. A particular source of concern is that even correctly-implemented concurrency abstractions cannot be composed together to form larger abstractions. In this paper we present a new concurrency model, based on transactional memory, that offers far richer composition. All the usual benefits of transactional memory are present (e.g. freedom from deadlock), but in addition we describe new modular forms of blocking and choice that have been inaccessible in earlier work.

Categories and Subject Descriptors D.1.3 [Programming Techniques]: Concurrent Programming - Parallel programming; D.4.1 [Operating Systems]: Process Management - Concurrency; Synchronization; Threads

General Terms Algorithms, Languages 

Keywords Non-blocking algorithms, locks, transactional memory 

1. Introduction 

Concurrent programming is notoriously tricky. Current lock-based 
abstractions are difficult to use and make it hard to design computer 
systems that are reliable and scalable. Furthermore, systems built 
using locks are difficult to compose without knowing about their 
internals. 

To address some of these difficulties, several researchers (including ourselves) have proposed software transactional memory (STM), which can perform groups of memory operations atomically [27]. Using transactional memory (implemented by optimistic synchronisation) instead of locks brings well-known advantages: freedom from deadlock and priority inversion, automatic roll-back on exceptions or timeouts, and freedom from the tension between lock granularity and concurrency.

Although promising, our previous work on transactional memory suffered a number of shortcomings: it could not statically prevent threads from bypassing transactional interfaces, and it did not provide a convincing story for operations that may block. In this paper we resolve these shortcomings. In particular, we make the following contributions:
• We re-express the ideas of transactional memory in the setting of Concurrent Haskell (Section 3). This is much more than a routine “port” into a new setting. As we show, STM can be expressed particularly elegantly in a declarative language, and we are able to use Haskell’s type system to give far stronger guarantees than are conventionally possible. Furthermore, transactions are compositional: small transactions can be glued together to form larger transactions.

• We present a new, modular form of blocking, which appears to the programmer as a simple function called retry (Section 3.2). Unlike most existing approaches, the programmer does not have to identify the condition under which the transaction can run to completion: retry can occur anywhere within the transaction, blocking it until an alternative execution path becomes possible.

• The retry function allows possibly-blocking transactions to be composed in sequence. Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call, except that orElse composes well, whereas select does not. It turns out that orElse requires the underlying STM implementation to support genuine nested transactions; ours is the first STM to do so (Section 6.4).

• Unusually for a practical programming language, we provide 
a formal operational semantics of our system in Section 5. 
This semantics clarifies the behaviour in cases which have a 
less intuitive meaning, such as what happens if an exception is 
raised mid-way through a memory transaction. 

• We have implemented our design in the Glasgow Haskell 
Compiler, a fully-fledged optimising compiler for Concurrent 
Haskell. The changes are localised, rather than pervasive, and 
we describe the details in Section 6. 

Taken together, these ideas offer a qualitative improvement in language support for modular concurrency, similar to the improvement in moving from assembly code to a high-level language. Our main war-cry is compositionality: a programmer can control atomicity and blocking behaviour in a modular way that respects abstraction barriers. In contrast, current lock-based approaches lead to a direct conflict between abstraction and concurrency (Section 2).

2. Background 

Throughout this paper we study internal concurrency between the 
threads interacting through memory in a single process; we do not 
consider here the questions of external interaction through storage 
systems or databases, nor do we address distributed systems. 

Even in this setting, concurrent programming is extremely difficult. The dominant programming technique is based on locks, an approach that is simple and direct, but that simply does not scale with program size and complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must avoid introducing deadlock; to ensure good performance, they must balance the granularity at which locking is performed against the costs of fine-grain locking. Perhaps the most fundamental objection, though, is that lock-based programs do not compose: correct fragments may fail when combined.

For example, consider a hash table with thread-safe insert and delete operations. Now suppose that we want to delete one item A from table t1, and insert it into table t2; but the intermediate state (in which neither table contains the item) must not be visible to other threads. Unless the implementor of the hash table anticipates this need, there is simply no way to satisfy this requirement. Even if she does, all she can do is expose methods such as LockTable and UnlockTable, but as well as breaking the hash-table abstraction, they invite lock-induced deadlock, depending on the order in which the client takes the locks, or race conditions if the client forgets. Yet more complexity is required if the client wants to await the presence of A in t1, but this blocking behaviour must not lock the table (else A cannot be inserted). In short, operations that are individually correct (insert, delete) cannot be composed into larger correct operations.

The same phenomenon shows up when trying to compose alternative blocking operations. Suppose a procedure p1 waits for one of two input pipes to have data, using an internal call to the Unix select procedure; and suppose another procedure p2 does the same thing, on two different pipes. In Unix there is no way to perform a select between p1 and p2, a fundamental loss of compositionality. Instead, Unix programmers learn awkward programming techniques to gather up all the file descriptors that must be waited for, perform a single top-level select, and then dispatch back to the correct handler. Again, two individually-correct abstractions, p1 and p2, cannot be composed into a larger one; instead, they must be ripped apart and awkwardly merged, in direct conflict with the goals of abstraction.

Rather than fixing locks, a more promising and radical alternative is to base concurrency control on atomic memory transactions, also known as transactional memory. We will show that transactional memory offers a solution to the tension between concurrency and abstraction. For example, with memory transactions we can manipulate the hash table thus:

atomic { v := delete(t1, A); insert(t2, A, v) }

and to wait for either p1 or p2 we can say

atomic { p1 `orElse` p2 }

These simple constructions require no knowledge of the implementation of insert, delete, p1, or p2, and they continue to work correctly if these operations may block, as we shall see.
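To make the hash-table example concrete in the Haskell notation introduced in Section 3, the following minimal sketch models each table as an immutable Data.Map held in a single TVar. This is a simplification (a real concurrent hash table would not serialise every update through one variable), and the names Table, delete, insert and move are hypothetical. It assumes GHC's stm package, where atomic is spelled atomically; treating a missing key as a no-op is also our assumption.

```haskell
import Control.Concurrent.STM
import qualified Data.Map as Map

-- Hypothetical stand-in for a thread-safe hash table: an immutable Map
-- held in a single TVar (so any two updates to the same table conflict).
type Table k v = TVar (Map.Map k v)

-- Remove k from the table, returning its value if it was present.
delete :: Ord k => Table k v -> k -> STM (Maybe v)
delete t k = do
  m <- readTVar t
  writeTVar t (Map.delete k m)
  return (Map.lookup k m)

-- Add (or overwrite) k in the table.
insert :: Ord k => Table k v -> k -> v -> STM ()
insert t k v = modifyTVar' t (Map.insert k v)

-- Move k from t1 to t2 as one transaction: no other thread can observe
-- the state in which neither table holds k.
move :: Ord k => Table k v -> Table k v -> k -> STM ()
move t1 t2 k = do
  mv <- delete t1 k
  case mv of
    Nothing -> return ()        -- assumption: moving a missing key is a no-op
    Just v  -> insert t2 k v

main :: IO ()
main = do
  t1 <- newTVarIO (Map.fromList [("A", 1 :: Int)])
  t2 <- newTVarIO Map.empty
  atomically (move t1 t2 "A")
  m1 <- readTVarIO t1
  m2 <- readTVarIO t2
  print (Map.toList m1, Map.toList m2)   -- ([],[("A",1)])
```

Because delete and insert return STM actions rather than IO actions, the caller decides where the transaction boundary goes.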

2.1 Transactional memory 

The idea of transactions is not new: they have been a fundamental 
mechanism in database design for many years, and there has been 
much recent work on transactional memories [11, 10, 9, 6, 31]. 

The key idea is that a block of code, including nested calls, can be enclosed by an atomic block, with the guarantee that it runs atomically with respect to every other atomic block. Transactional memory can be implemented using optimistic synchronisation. Instead of taking locks, an atomic block runs without locking, accumulating a thread-local transaction log that records every memory read and write it makes. When the block completes, it first validates its log, to check that it has seen a consistent view of memory, and then commits its changes to memory. If validation fails, because memory read by the block was altered by another thread during the block’s execution, then the block is re-executed from scratch.

Transactional memory eliminates, by construction, many of the low-level difficulties that plague lock-based programming [6]. There are no lock-induced deadlocks (because there are no locks); there is no priority inversion; and there is no painful tension between granularity and concurrency. However, little progress has been made on building transactional abstractions that compose well. We identify three particular problems.

Firstly, since a transaction may be re-run automatically, it is essential that it do nothing irrevocable. For example, the transaction

atomic { if (n>k) then launch_missiles(); S2 }

might launch a second salvo of missiles if it were re-executed. It might also launch the missiles inadvertently if, say, the thread was de-scheduled after reading n but before reading k, and another thread modified both before the thread was resumed. This problem calls for a guarantee that the body of the atomic block can only perform memory operations, and hence can only make benign modifications to its own transaction log, rather than performing irrevocable input/output.

Secondly, blocking is not composable. Many systems do not support synchronisation at all without using condition variables, and those that do rely on a programmer-supplied boolean guard on the atomic block [9]. For example, a method to get an item from a buffer might be:

Item get() {
  atomic (n_items > 0) {...remove item...}
}

The thread waits until the guard (n_items > 0) holds, before executing the block. But how could we take two consecutive items? We cannot call get(); get(), because another thread might perform an intervening get. We could try wrapping two calls to get in a nested atomic block, but the semantics of this are unclear unless the outer block checks there are two items in the buffer. This is a disaster for abstraction, because the client (who wants to get the two items) has to know about the internal details of the implementation. If several separate abstractions are involved, matters are even worse.
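For contrast, here is a sketch of the same buffer using the retry primitive introduced in Section 3.2, written against GHC's stm package (where atomic is spelled atomically). The Buffer type and the get and getTwo names are hypothetical. The point is that taking two items is plain sequential composition: getTwo contains no guard that mentions the buffer's internals.

```haskell
import Control.Concurrent.STM

-- Hypothetical buffer: the queued items, oldest first, in a single TVar.
type Buffer a = TVar [a]

-- Take one item, retrying (blocking) while the buffer is empty.
get :: Buffer a -> STM a
get buf = do
  items <- readTVar buf
  case items of
    []       -> retry
    (x : xs) -> do
      writeTVar buf xs
      return x

-- Two consecutive items: the caller needs no knowledge of get's blocking
-- condition, and no other get can interleave between the two.
getTwo :: Buffer a -> STM (a, a)
getTwo buf = do
  a <- get buf
  b <- get buf
  return (a, b)

main :: IO ()
main = do
  buf <- newTVarIO "xyz"
  pair <- atomically (getTwo buf)
  rest <- readTVarIO buf
  print (pair, rest)   -- (('x','y'),"z")
```

If the buffer held only one item, atomically (getTwo buf) would block as a whole rather than consume a single item.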

Thirdly, no previous transactional memory supports choice, exemplified by the select example mentioned earlier (but see Section 7.2 on Concurrent ML, which does). We tackle all three issues by presenting transactional memory in the context of the declarative language Concurrent Haskell, which we briefly review next.

2.2 Concurrent Haskell 

Concurrent Haskell [22] is an extension to Haskell 98, a pure, lazy, functional language. It provides explicitly-forked threads, and abstractions for communicating between them. This naturally involves side effects and so, given the lazy evaluation strategy, it is necessary to be able to control exactly when they occur. The big breakthrough came from a mechanism called monads [23].

Here is the key idea: a value of type IO a is an I/O action that, when performed, may do some I/O before yielding a value of type a. For example, the functions putChar and getChar have types:

putChar :: Char -> IO ()
getChar :: IO Char

That is, putChar takes a Char and delivers an I/O action that, when performed, prints the character on the standard output; while getChar is an action that, when performed, reads a character from the console and delivers it as the result of the action. A complete program must define an I/O action called main; executing the program means performing that action. For example:

main :: IO ()
main = putChar 'x'

-- The STM monad itself
data STM a
instance Monad STM
-- Monads support "do" notation and sequencing

-- Exceptions
throw :: Exception -> STM a
catch :: STM a -> (Exception -> STM a) -> STM a

-- Running STM computations
atomic :: STM a -> IO a
retry  :: STM a
orElse :: STM a -> STM a -> STM a

-- Transactional variables
data TVar a
newTVar   :: a -> STM (TVar a)
readTVar  :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()

Figure 1: The STM interface

I/O actions can be glued together by a monadic bind combinator. This is normally used through some syntactic sugar, allowing a C-like syntax. Here, for example, is a complete program that reads a character and then prints it twice:

main = do { c <- getChar; putChar c; putChar c } 

As well as performing external input/output, I/O actions include 
operations with side effects on mutable cells. A value of type 
IORef a is a mutable storage cell which can hold values of type 
a, and is manipulated (only) through the following interface: 

newIORef   :: a -> IO (IORef a)
readIORef  :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()

newIORef takes a value of type a and creates a mutable storage location holding that value. readIORef takes a reference to such a location and returns the value that it contains. writeIORef provides the corresponding update operation. Since these cells can only be created, read, and written using operations in the IO monad, there is a type-secure guarantee that ordinary functions are unaffected by state; e.g. a pure function sin cannot read or write an IORef because sin has type Float -> Float.

Concurrent Haskell supports threads, each independently performing input/output. Threads are created using a function forkIO:

forkIO :: IO a -> IO ThreadId

forkIO takes an I/O action as its argument, spawns a fresh thread to perform that action, and immediately returns its thread identifier to the caller. For example, here is a program that forks a thread that prints ‘x’, while the main thread goes on to print ‘y’:

main = do { forkIO (print 'x'); print 'y' }
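One caveat about this example, as we understand GHC's runtime: the program exits as soon as main returns, without waiting for other threads, so the forked thread may never get to print 'x'. The minimal sketch below waits for the child using an MVar (described in Section 4.1); the done variable is our own addition, and the order in which 'x' and 'y' appear is still unspecified.

```haskell
import Control.Concurrent

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO (print 'x' >> putMVar done ())  -- child signals when it has printed
  print 'y'
  takeMVar done                               -- block until the child has finished
```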

A fuller introduction to concurrency, I/O, exceptions and cross-language interfacing (the “awkward squad” for pure, lazy, functional programming) is given in [21]. Several general on-line tutorials on Haskell are also available, for instance [3].

3. Composable transactions 

We are now ready to present the key ideas of the paper. Our starting point is this: a purely-declarative language is a perfect setting for transactional memory, for two reasons. First, the type system explicitly separates computations which may have side-effects from effect-free ones. As we shall see, it is easy to refine it so that transactions can perform memory effects but not irrevocable input/output effects. Second, reads from and writes to mutable cells are explicit, and relatively rare: most computation takes place in the purely functional world. These functional computations perform many, many memory operations (allocation, update of thunks, stack operations, and so on), but none of these need to be tracked by the STM, because they are pure, and never need to be rolled back. Only the relatively-rare explicit operations need be logged, so a software implementation is entirely appropriate.

So our approach is to use Haskell as a kind of “laboratory” in 
which to study the ideas of transactional memory in a setting with a 
very expressive type system. As we shall see, we are able to define 
a much more compositional form of transactional memory than has 
been possible hitherto. As we go, we will mention primitives from 
the STM library, whose interface is summarised in Figure 1, and 
whose semantics we will describe more thoroughly in Section 5. 

3.1 Transactional variables and atomicity 

Suppose we wish to implement a resource manager, which holds 
an integer-valued resource. The call getR r n should acquire n 
units of resource r, blocking if r holds insufficient resource; the 
call putR r n should return n units of resource to r. 

Here is how we might program putR in STM Haskell: 

type Resource = TVar Int

putR :: Resource -> Int -> STM ()
putR r i = do { v <- readTVar r
              ; writeTVar r (v+i) }

The currently-available resource is held in a transactional variable 
of type TVar Int. The type declaration simply gives a name to 
this type. The function putR reads the value v of the resource from 
its cell, and writes back (v+i) into the same cell. (We discuss getR 
next, in Section 3.2.) 

The readTVar and writeTVar operations both return STM actions (Figure 1), but Haskell allows us to use the same do {...} syntax to compose STM actions as we did for I/O actions. These STM actions remain tentative during their execution: in order to expose an STM action to the rest of the system, it can be passed to a new function atomic, with type

atomic :: STM a -> IO a

It takes a memory transaction, of type STM a, and delivers an I/O 
action that, when performed, runs the transaction atomically with 
respect to all other memory transactions. One might say: 

main = do { ...; atomic (putR r 3); ... }

The atomic function and all of the STM-typed operations are built 
over the transactional memory described in Section 6. This deals 
with maintaining a per-thread transaction log to record the tentative 
accesses made to TVars. When atomic is invoked the STM checks 
that the logged accesses are valid - i.e. no concurrent transaction 
has committed conflicting updates. If the log is valid then the STM 
commits it atomically to the heap, thereby exposing its effects to 
other transactions. Otherwise the memory transaction is re-run with 
a fresh log. 
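A small, self-contained sketch of putR run through GHC's stm package, where atomic is called atomically, and newTVarIO and readTVarIO create and read a TVar from IO. The starting value of 5 is arbitrary.

```haskell
import Control.Concurrent.STM

type Resource = TVar Int

-- Return i units to resource r.
putR :: Resource -> Int -> STM ()
putR r i = do
  v <- readTVar r
  writeTVar r (v + i)

main :: IO ()
main = do
  r <- newTVarIO 5                       -- a resource holding 5 units
  atomically (putR r 3)                  -- atomically plays the role of atomic
  atomically (putR r 1 >> putR r 1)      -- two STM actions, one transaction
  readTVarIO r >>= print                 -- 10
```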

Splitting the world into STM actions and I/O actions provides two valuable guarantees:

• Only STM actions and pure computation can be performed inside a memory transaction; in particular I/O actions cannot. This is precisely the guarantee we sought in Section 2.1. It statically prevents the programmer from calling launchMissiles inside a transaction, because launching missiles is an I/O action with type IO (), and cannot be composed with STM actions.

• No STM actions can be performed outside a transaction, so the programmer cannot accidentally read or write a TVar without the protection of atomic. Of course, one can always say atomic (readTVar v) to read a TVar in a trivial transaction, but the call to atomic cannot be omitted.




3.2 Blocking memory transactions 

Any concurrency mechanism must provide a way for a thread to await an event or events caused by other threads. In lock-based programming, this is typically done using condition variables; message-based systems offer a construct to wait for messages on a number of channels; POSIX provides select; Win32 provides WaitForMultipleObjects; and STM systems to date allow the programmer to guard the atomic block with a boolean condition (see Section 2.1). None of these mechanisms are composable.

The Haskell setting led us to a remarkably simple and composable mechanism for blocking: a single STM action retry. Here is the code for getR:

getR :: Resource -> Int -> STM ()
getR r i = do { v <- readTVar r
              ; if (v < i) then retry
                else writeTVar r (v-i) }

It reads the value v of the resource and, if v >= i, decreases it by i. If not (that is, if there is insufficient resource in the variable), it calls retry. Conceptually, retry aborts the transaction with no effect, and restarts it at the beginning. However, there is no point in actually re-executing the transaction until at least one of the TVars read during the attempted transaction is written by another thread. Furthermore, the transaction log (which is needed anyway) already records exactly which TVars were read. The implementation therefore blocks the thread until at least one of these is updated. Notice that retry's type (STM a) allows it to be used wherever an STM action may occur.

Unlike the validation check, which is automatic and implicit, 
retry is called explicitly by the programmer. It does not indicate 
anything bad or unexpected; rather, it shows up when some kind of 
blocking would take place in other approaches to concurrency. 

Notice that there is no need for the putR operation to remember 
to signal any condition variables. Simply by writing to the TVars 
involved, the producer will wake up the consumer. A whole class 
of lost-wake-up bugs is eliminated thereby. 
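The following sketch, again assuming GHC's stm package (atomically for atomic), shows this wake-up behaviour: a consumer blocks in getR until a producer's putR writes the TVar it read. The threadDelay only makes it likely that the consumer blocks before the write; the final value does not depend on it. The done MVar is a hypothetical helper used to wait for the consumer.

```haskell
import Control.Concurrent
import Control.Concurrent.STM

type Resource = TVar Int

putR :: Resource -> Int -> STM ()
putR r i = do
  v <- readTVar r
  writeTVar r (v + i)

-- Take i units, retrying (blocking) while r holds fewer than i.
getR :: Resource -> Int -> STM ()
getR r i = do
  v <- readTVar r
  if v < i then retry else writeTVar r (v - i)

main :: IO ()
main = do
  r <- newTVarIO 2
  done <- newEmptyMVar
  -- Consumer: wants 5 units, so its transaction retries until r changes.
  _ <- forkIO (atomically (getR r 5) >> putMVar done ())
  threadDelay 100000        -- let the consumer block first (not needed for correctness)
  atomically (putR r 4)     -- no explicit signal: writing r re-runs the consumer
  takeMVar done
  readTVarIO r >>= print    -- 1
```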

From an efficiency point of view, it makes sense to call retry as early as possible, and to refrain from reading unrelated locations until after the test succeeds. Nevertheless, the programming interface is delightfully simple, and easy to reason about.

3.3 Sequential composition 

By using atomic, the programmer identifies atomic transactions, in the classic sense that the entire set of operations that it contains appears to take place indivisibly. This is the key to sequential composition for concurrency abstractions. For example, to grab three units of one resource and seven of another, a thread can say

atomic (do { getR r1 3; getR r2 7 })

The standard do {..;..} notation combines the STM actions from the two getR calls, and the underlying transactional memory commits their updates as a single atomic I/O action.

The retry function is central to making transactions composable when they may block. The transaction above will block if either r1 or r2 has insufficient resource: there is no need for the caller to know how getR is implemented, or what condition guarantees its success. Nor is there any risk of deadlock by awaiting r2 while holding r1.

This ability to compose STM actions is why we did not define getR as an I/O action, wrapped in a call to atomic. By leaving it as an STM action, we allow the programmer to compose it with other STM actions before finally sealing it into a transaction with atomic. In a lock-based setting, one would worry about crucial locks being released between the two calls, and about deadlock if another thread grabbed the resources in the opposite order, but there are no such concerns here. Any STM action can be robustly composed with other STM actions.
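A sketch of this all-or-nothing behaviour, assuming GHC's stm package and arbitrary starting values. While the composed transaction is blocked on r2, the tentative decrement of r1 is never visible to other threads; once r2 is topped up, both updates commit together.

```haskell
import Control.Concurrent
import Control.Concurrent.STM

type Resource = TVar Int

putR :: Resource -> Int -> STM ()
putR r i = do
  v <- readTVar r
  writeTVar r (v + i)

getR :: Resource -> Int -> STM ()
getR r i = do
  v <- readTVar r
  if v < i then retry else writeTVar r (v - i)

main :: IO ()
main = do
  r1 <- newTVarIO 10
  r2 <- newTVarIO 5                       -- too few units for getR r2 7
  done <- newEmptyMVar
  _ <- forkIO (atomically (getR r1 3 >> getR r2 7) >> putMVar done ())
  threadDelay 100000
  before <- readTVarIO r1                 -- 10: the tentative getR r1 3 is invisible
  atomically (putR r2 2)                  -- r2 now holds 7, so the pair can commit
  takeMVar done
  after <- (,) <$> readTVarIO r1 <*> readTVarIO r2
  print (before, after)                   -- (10,(7,0))
```

The value of before is 10 regardless of scheduling, because the composed transaction cannot commit until after the main thread's putR.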

3.4 Composing alternatives 

We have discussed composing transactions in sequence, so that both are executed. STM Haskell also lets us compose transactions as alternatives, so that only one is executed. For example, to get either 3 units from r1 or 7 units from r2:

atomic (getR r1 3 `orElse` getR r2 7)

The orElse function is provided by the STM module (Figure 1); 
here, it is written infix, by enclosing it in backquotes, but it is a 
perfectly ordinary function of two arguments. 

The transaction s1 `orElse` s2 first runs s1; if it retries, then s1 is abandoned with no effect, and s2 is run. If s2 retries as well, the entire call retries, but it waits on the variables read by either of the two nested transactions. Again, the programmer need know nothing about the enabling conditions of s1 and s2.
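A minimal sketch of orElse with GHC's stm package (atomically in place of atomic). Because r1 cannot supply 3 units, the first alternative retries and is abandoned, and the second runs. The string tags are an illustrative addition that report which branch committed.

```haskell
import Control.Concurrent.STM

type Resource = TVar Int

getR :: Resource -> Int -> STM ()
getR r i = do
  v <- readTVar r
  if v < i then retry else writeTVar r (v - i)

main :: IO ()
main = do
  r1 <- newTVarIO 1                       -- cannot supply 3 units
  r2 <- newTVarIO 9                       -- can supply 7 units
  which <- atomically ((getR r1 3 >> return "r1") `orElse` (getR r2 7 >> return "r2"))
  vs <- (,) <$> readTVarIO r1 <*> readTVarIO r2
  print (which, vs)                       -- ("r2",(1,2))
```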

Using orElse provides an elegant way for library implementors 
to defer to their caller the question of whether or not to block. 
For instance it is straightforward to convert the blocking version 
of getR into one which returns a boolean success or failure result: 

nonBlockGetR :: Resource -> Int -> STM Bool
nonBlockGetR r i = do { getR r i ; return True }
                   `orElse` return False

Notice that this idiom depends on the left-biased nature of orElse. 
The same kind of construction can be also used to build a blocking 
operation from one that returns a boolean result: simply invoke 
retry on receiving a False result: 

blockGetR :: Resource -> Int -> STM ()
blockGetR r i =
  do { s <- nonBlockGetR r i
     ; if s then return () else retry }
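Runnable versions of nonBlockGetR and blockGetR, assuming GHC's stm package. getR is repeated from Section 3.2 so the block stands alone, and the resource sizes are arbitrary.

```haskell
import Control.Concurrent.STM

type Resource = TVar Int

getR :: Resource -> Int -> STM ()
getR r i = do
  v <- readTVar r
  if v < i then retry else writeTVar r (v - i)

-- Left-biased: report True if getR succeeds, False if it would block.
nonBlockGetR :: Resource -> Int -> STM Bool
nonBlockGetR r i = (getR r i >> return True) `orElse` return False

-- Recover the blocking version by retrying on a False result.
blockGetR :: Resource -> Int -> STM ()
blockGetR r i = do
  s <- nonBlockGetR r i
  if s then return () else retry

main :: IO ()
main = do
  r <- newTVarIO 4
  a <- atomically (nonBlockGetR r 3)   -- succeeds, leaving 1 unit
  b <- atomically (nonBlockGetR r 3)   -- fails without blocking; r unchanged
  atomically (blockGetR r 1)           -- succeeds, leaving 0 units
  v <- readTVarIO r
  print (a, b, v)                      -- (True,False,0)
```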

The orElse function obeys useful laws: it is associative, and has unit retry:

M1 `orElse` (M2 `orElse` M3) = (M1 `orElse` M2) `orElse` M3
retry `orElse` M = M
M `orElse` retry = M

Haskell aficionados will recognise that STM may thus be an instance of MonadPlus.
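In GHC's base library, STM is indeed an instance of both Alternative and MonadPlus, with empty = retry and (<|>) = orElse, so generic combinators such as asum work directly on transactions. The sketch below relies on that; getFromAny is a hypothetical helper that takes units from the first resource able to supply them.

```haskell
import Control.Applicative (empty, (<|>))
import Control.Concurrent.STM
import Data.Foldable (asum)

type Resource = TVar Int

getR :: Resource -> Int -> STM ()
getR r i = do
  v <- readTVar r
  if v < i then retry else writeTVar r (v - i)

-- Take n units from the first resource that has them, returning its index.
-- asum folds with (<|>), i.e. orElse; an empty list gives empty, i.e. retry.
getFromAny :: [Resource] -> Int -> STM Int
getFromAny rs n = asum [getR r n >> return ix | (ix, r) <- zip [0 ..] rs]

main :: IO ()
main = do
  rs <- mapM newTVarIO [1, 2, 8]
  ix <- atomically (getFromAny rs 5)
  x  <- atomically (empty <|> return 'm')   -- unit law: retry `orElse` M = M
  vs <- mapM readTVarIO rs
  print (ix, x, vs)                          -- (2,'m',[1,2,3])
```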

3.5 Exceptions 

The STM monad supports exceptions just like the IO monad, and in much the same way as (say) C#. Two new primitive functions, catch and throw, are required; their types are given in Figure 1. (As with atomic, no new language constructs are needed.) The question is: how should transactions and exceptions interact? For example, what should this transaction do?

atomic (do { n <- readTVar v_n
            ; lim <- readTVar v_lim
            ; writeTVar v_n (n+1)
            ; if n > lim then throw (AssertionFailed "Urk")
              else if (n == lim) then retry
              else return ()
            ; ...write data into buffer... })

The programmer throws an exception if n>lim, in which case the ...write data... part will clearly not take place. But what about the write to v_n from before the exception was thrown?

Concurrent Haskell encourages programmers to use exceptions for signalling error conditions, rather than for normal control flow. Built-in exceptions, such as divide-by-zero, also fall into this category. For consistency, then, in the above program we do not want the programmer to have to take account of the possibility of exceptions when reasoning that if v_n is (observably) written then data is written into the buffer. We therefore specify that exceptions have abort semantics: if an atomic transaction throws an exception, the transaction is aborted with no effect. If the programmer wants to commit the effects up to the point at which the exception was thrown, he can easily catch the exception inside the transaction, and return normally; the transaction is only aborted if the exception propagates to the end of the atomic block.
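The sketch below, assuming GHC's stm package, exercises both cases. There the primitives are spelled throwSTM and catchSTM rather than throw and catch, and catchSTM also rolls back writes made by the action it protects (only effects before it persist), so the lenient variant performs the increment before the caught check. The names bump and bumpLenient are hypothetical.

```haskell
import Control.Concurrent.STM
import Control.Exception (AssertionFailed (..), try)

-- The transaction from the text (without the buffer write).
bump :: TVar Int -> TVar Int -> STM ()
bump vN vLim = do
  n <- readTVar vN
  lim <- readTVar vLim
  writeTVar vN (n + 1)
  if n > lim
    then throwSTM (AssertionFailed "Urk")
    else if n == lim then retry else return ()

-- Same update, but the failing check is caught inside the transaction,
-- which therefore returns normally and commits the increment.
bumpLenient :: TVar Int -> TVar Int -> STM ()
bumpLenient vN vLim = do
  n <- readTVar vN
  lim <- readTVar vLim
  writeTVar vN (n + 1)
  (if n > lim then throwSTM (AssertionFailed "Urk") else return ())
    `catchSTM` \(AssertionFailed _) -> return ()

main :: IO ()
main = do
  vN <- newTVarIO 5
  vLim <- newTVarIO 3
  r <- try (atomically (bump vN vLim)) :: IO (Either AssertionFailed ())
  n1 <- readTVarIO vN                  -- 5: the aborted write was discarded
  atomically (bumpLenient vN vLim)
  n2 <- readTVarIO vN                  -- 6: the handler returned, so the write commits
  print (either (const True) (const False) r, n1, n2)   -- (True,5,6)
```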

Our use of exceptions to abort atomic blocks is a free design choice. In other languages, especially in ones where exceptions are used more frequently, it might be appropriate to distinguish between exceptions that cause the enclosing atomic block to abort and exceptions that allow it to commit before they are propagated. Shinnar et al. show how abort semantics are valuable when handling exceptions even in single-threaded applications [28].

Notice the difference between calling throw and calling retry. 
The former signals an error, and aborts the transaction; the latter 
only indicates that the transaction is not yet ready to run, and causes 
it to block until the situation changes. 

An exception can carry a value out of the STM world. For 
example, consider 

atomic (do { s <- readTVar svar
            ; writeTVar svar "Wuggle"
            ; if length s < 10 then
                throw (AssertionFailed s)
              else ... })

Here, the external world gets to see the exception value holding 
the string s that was read out of the TVar. On the other hand, 
since the transaction is aborted, no writes to svar are externally 
observable. One might argue that it is wrong to allow even reads to 
“leak” from an aborted transaction, but we do not agree. The values 
carried by an exception can only represent a consistent view of the 
store (or validation would fail, and the transaction would retry), and 
it is almost impossible to debug an error condition that only says 
“something bad happened” while deliberately discarding all clues
to what the bad thing was. The basic transactional guarantees are 
not threatened. 
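A minimal sketch of this case with GHC's stm package (throwSTM for throw, atomically for atomic): the string read from svar escapes in the exception value, while the tentative write of "Wuggle" is discarded. The name probe and the initial contents are hypothetical.

```haskell
import Control.Concurrent.STM
import Control.Exception (AssertionFailed (..), try)

-- Reads svar, tentatively overwrites it, then throws if the old value is short.
probe :: TVar String -> STM String
probe svar = do
  s <- readTVar svar
  writeTVar svar "Wuggle"
  if length s < 10 then throwSTM (AssertionFailed s) else return s

main :: IO ()
main = do
  svar <- newTVarIO "short"
  r <- try (atomically (probe svar))
  after <- readTVarIO svar
  case r of
    Left (AssertionFailed msg) -> print (msg, after)   -- ("short","short")
    Right s                    -> print (s, after)
```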

What if the exception carries a TVar allocated in the aborted 
transaction? A dangling pointer would be unpleasant! To avoid 
this we refine the semantics of exceptions to say that a transaction 
that throws an exception is aborted so far as its write effects are 
concerned, but its allocation effects are retained; after all, they are 
thread-local. As a result, the TVar is visible after the transaction, 
in the state it had when it was allocated. Cases like these are tricky, 
which is why we provide a full formal semantics in Section 5. 

Concurrent Haskell also provides asynchronous exceptions 
which can be thrown into a thread as a signal - typical examples 
are error conditions like stack overflow, or when a master thread 
wishes to shut down a helper. If a thread is in the midst of an 
STM transaction, then the transaction log can be discarded without 
externally-visible effects. By aborting the transaction we provide a 
kill-safe mechanism for avoiding the kind of consistency problems 
that Flatt and Findler describe [5].
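A sketch of this kill-safety under GHC's stm package: a helper thread writes a TVar and then blocks on a gate that is never opened, so its transaction can never commit. Killing the helper with killThread discards its log, and the write is never observed. The names blockedWriter and gate are hypothetical.

```haskell
import Control.Concurrent
import Control.Concurrent.STM

-- Writes v tentatively, then waits on a gate that this example never opens,
-- so the transaction can never commit.
blockedWriter :: TVar Int -> TVar Bool -> STM ()
blockedWriter v gate = do
  writeTVar v 1
  open <- readTVar gate
  if open then return () else retry

main :: IO ()
main = do
  v <- newTVarIO 0
  gate <- newTVarIO False
  tid <- forkIO (atomically (blockedWriter v gate))
  threadDelay 100000      -- let the helper block inside its transaction
  killThread tid          -- asynchronous exception: the helper's log is discarded
  x <- readTVarIO v
  g <- readTVarIO gate    -- reading gate here also keeps it live while the helper waits
  print (x, g)            -- (0,False)
```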

4. Applications and examples 

In this section we provide some examples of how composable memory transactions can be used to build higher-level concurrency abstractions. We focus on operations that involve potentially-blocking communication between threads. Previous work has shown, many times over, how standard shared-memory data structures can be developed from sequential code using transactional memory operations (for instance [10, 9]).

4.1 MVars 

Prior to our STM work, Concurrent Haskell provided MVars as its primitive mechanism for allowing threads to communicate safely. An MVar is a mutable location like a TVar, except that it may be either empty, or full with a value. The takeMVar function leaves a full MVar empty, and blocks on an empty MVar. A putMVar on an empty MVar leaves it full, and blocks on a full MVar. So MVars are, in effect, a one-place channel.

It is easy to implement MVars on top of TVars. An MVar holding 
a value of type a can be represented by a TVar holding a value 
of type Maybe a; this is a type that is either an empty value 
(“Nothing”), or actually holds an a (e.g. “Just 42”).

type MVar a = TVar (Maybe a)

newEmptyMVar :: STM (MVar a)
newEmptyMVar = newTVar Nothing

The takeMVar operation reads the contents of the TVar and retries 
until it sees a value other than Nothing: 

takeMVar :: MVar a -> STM a
takeMVar mv
  = do { v <- readTVar mv
       ; case v of
           Nothing  -> retry
           Just val -> do { writeTVar mv Nothing
                          ; return val } }

The corresponding putMVar operation retries until it sees Nothing, 
at which point it updates the underlying TVar: 

putMVar :: MVar a -> a -> STM ()
putMVar mv val
  = do { v <- readTVar mv
       ; case v of
           Nothing -> writeTVar mv (Just val)
           Just _  -> retry }

Notice how operations which return a boolean success / failure result can be built directly from these blocking designs. For instance:

tryPutMVar :: MVar a -> a -> STM Bool
tryPutMVar mv val
  = do { putMVar mv val ; return True }
    `orElse` return False
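Putting the pieces together, here is a self-contained sketch of MVars built on TVars, using GHC's stm package. The names mirror the text, so only forkIO is imported from Control.Concurrent to avoid clashing with the runtime's own MVar operations; the producer/consumer driver is illustrative. (The stm package also ships this idea ready-made as TMVar.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM)

type MVar a = TVar (Maybe a)

newEmptyMVar :: STM (MVar a)
newEmptyMVar = newTVar Nothing

-- Retry until the slot is full, then empty it.
takeMVar :: MVar a -> STM a
takeMVar mv = do
  v <- readTVar mv
  case v of
    Nothing  -> retry
    Just val -> do
      writeTVar mv Nothing
      return val

-- Retry until the slot is empty, then fill it.
putMVar :: MVar a -> a -> STM ()
putMVar mv val = do
  v <- readTVar mv
  case v of
    Nothing -> writeTVar mv (Just val)
    Just _  -> retry

tryPutMVar :: MVar a -> a -> STM Bool
tryPutMVar mv val = (putMVar mv val >> return True) `orElse` return False

main :: IO ()
main = do
  mv <- atomically newEmptyMVar
  -- Producer: each putMVar retries until the consumer has emptied the slot.
  _ <- forkIO (forM_ [1 .. 5 :: Int] (atomically . putMVar mv))
  xs <- replicateM 5 (atomically (takeMVar mv))
  -- In one transaction: the first put succeeds, the second finds the slot full.
  results <- atomically ((,) <$> tryPutMVar mv 0 <*> tryPutMVar mv 0)
  print (xs, results)   -- ([1,2,3,4,5],(True,False))
```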

4.2 Multicast channels 

MVars effectively provide communication channels with a single buffered item. In this section we show how to program buffered, multi-item, multicast channels, in which items written to the channel (writeMChan in the interface below) are buffered internally and received once by each read-port created from the channel. The full interface is:

data MChan a
data Port a

newMChan :: STM (MChan a)

-- Write an item to the channel:
writeMChan :: MChan a -> a -> STM ()

-- Create a new read port:
newPort :: MChan a -> STM (Port a)

-- Read the next buffered item:
readPort :: Port a -> STM a

We represent the buffered data by a linked list, or Chain, of items, with a transactional variable in the tail, so that it can be extended by writeMChan:

type Chain a = TVar (Item a)
data Item a = Empty | Full a (Chain a)

An MChan is represented by a mutable pointer to the “write” end of 
the chain, while a Port points to the read end: 

type MChan a = TVar (Chain a) 
type Port a = TVar (Chain a) 

With these definitions, the code writes itself: 

newMChan = do { c <- newTVar Empty; newTVar c }
newPort mc = do { c <- readTVar mc; newTVar c }

readPort p
  = do { c <- readTVar p
       ; i <- readTVar c
       ; case i of
           Empty     -> retry
           Full v c' -> do { writeTVar p c'
                           ; return v } }

writeMChan mc v
  = do { c <- readTVar mc
       ; c' <- newTVar Empty
       ; writeTVar c (Full v c')
       ; writeTVar mc c' }

Notice the use of retry to block readPort when the buffer is 
empty. Although this implementation is very simple, it ensures 
that each item written into the MChan is delivered to every Port; 
it allows multiple writers (their writes are interleaved); it allows 
multiple readers on each port (data read by one is not seen by 
the other readers on that port); and when a port is discarded, the 
garbage collector recovers the buffered data. 
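The multicast channel can be checked in the same way. The following sketch assumes GHC's stm package (where atomic is atomically) and keeps the paper's type synonyms; the comments trace how the write end and each port move along the chain.

```haskell
-- Assumes GHC's stm package; the paper's `atomic` is `atomically` there.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVar, orElse, readTVar, retry, writeTVar)

-- A transactional linked list whose tail is a TVar, so writers can
-- extend it in place.
type Chain a = TVar (Item a)
data Item a = Empty | Full a (Chain a)

-- The channel points at the write end; each port at its own read position.
type MChan a = TVar (Chain a)
type Port a  = TVar (Chain a)

newMChan :: STM (MChan a)
newMChan = do
  c <- newTVar Empty
  newTVar c

-- A new port starts at the current write end, so it sees only items
-- written after it was created.
newPort :: MChan a -> STM (Port a)
newPort mc = do
  c <- readTVar mc
  newTVar c

readPort :: Port a -> STM a
readPort p = do
  c <- readTVar p
  i <- readTVar c
  case i of
    Empty     -> retry          -- nothing buffered for this port: block
    Full v c' -> do
      writeTVar p c'            -- advance this port only
      return v

writeMChan :: MChan a -> a -> STM ()
writeMChan mc v = do
  c  <- readTVar mc
  c' <- newTVar Empty
  writeTVar c (Full v c')       -- fill the old (empty) tail cell
  writeTVar mc c'               -- the new empty cell becomes the write end
```

With type synonyms, MChan and Port are the same type, so nothing stops a port being passed where a channel is expected; the abstract data declarations in the interface suggest that a packaged library would hide both behind newtypes.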

More complicated variants are simple to program. For example, 
suppose we wanted to ensure that the writer could get no more than 
N items ahead of the most advanced reader. One way to do this 
would be for the writer to include a serially-increasing Int in each 
Item, and have a shared TVar holding the maximum serial number 
read so far by any reader. It is simple for the readers to keep this up 
to date, and for the writer to consult it before adding another item. 
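One possible reading of the bounded variant is sketched below, assuming GHC's stm package. All names (BChan, BPort, newBChan, writeBChan, readBPort) are hypothetical, and "N items ahead" is interpreted as: after a write, the number of items whose serial exceeds the highest serial read by any reader is at most N.

```haskell
-- Assumes GHC's stm package; names are hypothetical, not from the paper.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVar, orElse, readTVar, retry, writeTVar)

-- Each item carries a serial number assigned by the writer.
type Chain a = TVar (Item a)
data Item a = Empty | Full Int a (Chain a)

-- The write end, the next serial to assign, the highest serial read by
-- any reader so far (-1 if none), and the bound N.
data BChan a = BChan
  { writeEnd   :: TVar (Chain a)
  , nextSerial :: TVar Int
  , maxRead    :: TVar Int
  , bound      :: Int
  }

-- A port holds its read position plus the channel's shared maxRead.
data BPort a = BPort (TVar (Chain a)) (TVar Int)

newBChan :: Int -> STM (BChan a)
newBChan n = do
  c <- newTVar Empty
  BChan <$> newTVar c <*> newTVar 0 <*> newTVar (-1) <*> pure n

newBPort :: BChan a -> STM (BPort a)
newBPort ch = do
  c   <- readTVar (writeEnd ch)
  pos <- newTVar c
  return (BPort pos (maxRead ch))

readBPort :: BPort a -> STM a
readBPort (BPort pos mr) = do
  c <- readTVar pos
  i <- readTVar c
  case i of
    Empty       -> retry
    Full s v c' -> do
      writeTVar pos c'
      m <- readTVar mr
      writeTVar mr (max m s)   -- keep the shared high-water mark up to date
      return v

-- Retries if writing item n would leave more than N unread items
-- beyond the most advanced reader.
writeBChan :: BChan a -> a -> STM ()
writeBChan ch v = do
  n <- readTVar (nextSerial ch)
  m <- readTVar (maxRead ch)
  if n - m > bound ch
    then retry
    else do
      c  <- readTVar (writeEnd ch)
      c' <- newTVar Empty
      writeTVar c (Full n v c')
      writeTVar (writeEnd ch) c'
      writeTVar (nextSerial ch) (n + 1)
```

Because the writer reads maxRead inside its transaction, a reader's commit that advances maxRead wakes a blocked writer without any explicit signalling.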

4.3 Merge 

We have already stressed that transactions are composable. For 
example, to read from either of two different multicast channels 
we can say: 

atomic (readPort p1 `orElse` readPort p2)

No changes need to be made to either multicast channel. If neither 
port has any data, the STM machinery will cause the thread to wait 
simultaneously on the TVars at the extremity of each channel. 

Equally, the programmer can wait on a condition which involves 
a mixture of MVars and channels (perhaps the multicast channel 
indicates ordinary data and an MVar is being used to signal a 
termination request), for instance: 

atomic (readPort p1 `orElse` takeMVar m1)

This example is contrived for brevity, but it shows how operations 
taken from different libraries, implemented without anticipation of 
them being used together, can be composed. In the most general 
case we can select between values received from a number of 
different sources. Given a list of computations of type STM a we 
can take the first value to be produced from any of them by defining 
a merge operator: 

merge :: [STM a] -> STM a
merge = foldr1 orElse

This example is childishly simple in STM Haskell. In contrast, a
function of type

mergeIO :: [IO a] -> IO a

is unimplementable in Concurrent Haskell, or indeed in other
settings with operations built from mutual exclusion locks and
condition variables.

Figure 2: The syntax of values and terms
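A small runnable check of merge, assuming GHC's stm package; takeSlot is a hypothetical one-slot source written in the style of takeMVar. Alternatives are tried left to right, and any alternative that retries has its effects discarded before the next one runs.

```haskell
-- Assumes GHC's stm package; takeSlot is a hypothetical helper.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, orElse, readTVar, retry, writeTVar)

-- Take the first value produced by any alternative, left to right.
-- foldr1 requires a non-empty list.
merge :: [STM a] -> STM a
merge = foldr1 orElse

-- Retries while the slot is empty; otherwise empties it and returns the value.
takeSlot :: TVar (Maybe a) -> STM a
takeSlot t = do
  v <- readTVar t
  case v of
    Nothing -> retry
    Just x  -> writeTVar t Nothing >> return x
```

If every alternative retries, the enclosing atomically blocks until one of the TVars read by any alternative changes. Defining merge as foldr orElse retry instead would make merge [] block forever rather than fail, which is one possible design choice.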

4.4 Summary 

Our main claim is that transactional memory qualitatively raises 
the level of abstraction offered to programmers. Just as high-level 
languages free programmers from worrying about register alloca- 
tion, so transactional memory frees the programmer from concerns 
about locks and lock acquisition order. More fundamentally, one 
can combine abstractions without knowing their implementations, 
a property that is the key to constructing large programs. 

Like high-level languages, transactional memory does not ban- 
ish bugs altogether; for example, two threads can easily deadlock 
if each awaits some communication from the other. But, again 
like high-level languages, the gain is very substantial: transactions 
provide a programming platform for concurrency that eliminates 
whole classes of concurrency errors, and allows the programmer to 
concentrate on the really interesting bits. 

5. The semantics of STM Haskell 

So far our description of the functions in Figure 1 has been infor- 
mal. It is hard to be sure that such descriptions cover all the com- 
binations of these functions that might arise, so in this section we 
provide a formal, operational semantics for STM Haskell. 

Figure 4 gives a small-step operational semantics for a small
language whose syntax is given in Figure 2. The key idea is that
there are two transition relations: the top-level I/O transitions,
written “→”; and the STM transitions, written “⇒”. The I/O transition
relation takes a program state P; Θ to a new program state Q; Θ',
while performing input/output described by an action a:

$$P;\Theta \xrightarrow{a} Q;\Theta'$$

Execution proceeds by repeatedly choosing a thread, and execut- 
ing a single I/O transition; transitions from different threads may 
thereby be interleaved in a non-deterministic way. An atomic 
block, however, invokes zero or more steps of the STM transition 
relation, but the resulting state change is regarded as a single I/O
transition; transitions in the STM relation therefore cannot interleave.
The semantics has no notion of transaction logs or rollback - these 
are implementation matters. Instead the semantics expresses atom- 
icity simply by requiring that an atomic block, if chosen for the 
next I/O transition, must reduce (using ⇒) to a return or throw,
and not to retry. The rest of this section fleshes out the details. 
Figure 3: The program state and evaluation contexts


5.1 Syntax 

Figure 2 gives the syntax of a fragment of STM Haskell. Terms and 
values are entirely conventional, except that we treat the application 
of monadic combinators, such as return and catch, as values. 
The do-notation we have been using so far is syntactic sugar for 
uses of return and >>=: 

do {x <- e; Q}  =  e >>= (\x -> do {Q})

do {e; Q}       =  e >>= (\_ -> do {Q})

do {e}          =  e
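To make the desugaring concrete, here is one small STM action written with do-notation and again by hand using the three equations. It assumes GHC's stm package; moveDo and moveDesugared are hypothetical names.

```haskell
-- Assumes GHC's stm package; function names are hypothetical.
import Control.Concurrent.STM (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Move the contents of one TVar into another, returning the amount moved.
moveDo :: TVar Int -> TVar Int -> STM Int
moveDo from to = do
  x <- readTVar from
  y <- readTVar to
  writeTVar to (x + y)
  writeTVar from 0
  return x

-- The same action after applying the desugaring equations by hand:
-- binding statements become (\x -> ...), others become (\_ -> ...).
moveDesugared :: TVar Int -> TVar Int -> STM Int
moveDesugared from to =
  readTVar from >>= (\x ->
  readTVar to >>= (\y ->
  writeTVar to (x + y) >>= (\_ ->
  writeTVar from 0 >>= (\_ ->
  return x))))
```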

The monadic operations return, >>=, throw, and catch are overloaded,
and can be used in both the IO and STM monad. Specific to the IO
monad are:

getChar :: IO Char
putChar :: Char -> IO ()
forkIO  :: IO a -> IO ThreadId

I/O transitions are labelled with an optional action a, describing 
the input/output effect of the transition. The actions a (Figure 2) 
allow reading a character c from standard input, ?c; writing one to
standard output, !c; or the silent action ε, which is often omitted. A
real system would have many more input/output actions. 

A program state P; Θ consists of a thread soup P and a heap
Θ (Figure 3). A thread soup is just a multi-set of threads, each
consisting of a single term M annotated with a thread ID t. A heap,
Θ, is a finite mapping from references to terms.

To describe the possible transitions of a program state, we use an 
evaluation context to identify the active site for the transition. Fig- 
ure 3 gives the syntax of evaluation contexts. A program evaluation 
context, P, corresponds to the scheduler of a real implementation. 
It chooses an arbitrary thread from the soup, and then uses the term 
evaluation context E to find the active site in the term. The term 
evaluation context corresponds to the stack of a real machine, and 
looks into the left operand of >>=, catch, and orElse.

5.2 Operational semantics 

Now we are ready to discuss the transition rules of Figure 4. First 
we treat the I/O transitions, in the top part of the figure, which 
can have arbitrary input/output effects. The first two rules deal 
with input and output. If the active term is a putChar or getChar 
the appropriate labelled transition takes place, and the operation is 
replaced by a return carrying the result. Rule (FORK) allows a 
new thread to be created, by adding a new term M to the thread 
soup, allocating a fresh name t as its ThreadId.

Rule (ADMIN) concerns administrative transitions, which are 
given in the second section of Figure 4. Rule (EVAL) allows a 
term M that is not a value to be evaluated by an auxiliary function,
V⟦M⟧, which gives the value of M. This function is entirely
standard, and we omit it here. Rule (BIND) implements sequential 
composition in the monad. The rules (THROW), (CATCH1) and 
(CATCH2) implement exceptions in the standard way. All of these 
rules are, as we shall see, used in both the IO monad and the STM
monad, which is why we keep them in a separate group. 


Everything so far is quite standard. The new part starts with 
rules (ARET) and (ATHROW). The former describes how an 
atomic transaction takes place: the term M makes zero or more¹
transitions of the STM relation, ⇒, which takes the following form:

$$M;\Theta,\Delta \Rightarrow N;\Theta',\Delta'$$

Here, Θ is the heap as before, while Δ redundantly records the
allocation effects (only) of the transition, for use during exception
handling. Rule (ARET) specifies that the term M may make zero
or more STM transitions until it reaches the form (return N),
indicating successful completion. In that case, rule (ARET) takes one
step, taking the new heap Θ' as its resulting heap. In contrast,
rule (ATHROW) specifies that if M evaluates to (throw N), then
the new heap Θ' is discarded, and instead just the allocation effects
Δ are added to the initial heap Θ.

Rules (ATHROW) and (ARET) are the only rules in the top 
panel of Figure 4 that affect the heap, so we can see immediately 
that the heap can be mutated only inside an atomic block. Fur- 
thermore, notice that multiple STM transitions yield a single pro- 
gram transition. Program transitions from different threads can be 
interleaved, but (ARET) provides no way for STM transitions to 
interleave. This is precisely what it means to execute “atomically”. 
(A real implementation will not do this, but we are concerned with 
semantics here.) 

The STM transitions themselves, in the last part of Figure 4, 
are largely standard. In particular, rules (READ), (WRITE), and
(NEW) describe how mutable locations are read, written, and
created; the only point of interest is that (NEW) records the
location’s creation not only in the heap, but also in the allocation
record Δ, for use by (ATHROW). Rule (AADMIN) lifts the
administrative transitions into the STM world, just as (ADMIN) does
for the I/O world. The interesting part is the orElse combinator and
retry, which we tackle next.

5.3 Blocking and nested transactions 

The alert reader may be wondering why there is no rule (ARETRY) 
to go along with (ARET) and (ATHROW), to account for the fact 
that an STM computation may evaluate to retry, for instance: 

atomic (do { v <- readTVar r
           ; if v==0 then retry else return ()
           ; ... })

What if v is zero? Then the body of the atomic block reduces to 
retry. There is no rule for this case. This means that the transition 
system cannot make progress by choosing a thread whose next 
operation is an atomic block, when the heap will cause it to retry. 
To make progress, another thread must be chosen. 
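The following sketch runs this situation in GHC, assuming its stm package (where atomic is atomically) and forkIO from Control.Concurrent. While r is zero the consumer's atomic block reduces to retry and cannot proceed; the main thread's write is what lets it make progress. waitNonZero and demo are hypothetical names.

```haskell
-- Assumes GHC's stm package and Control.Concurrent; names are hypothetical.
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, retry, writeTVar)

-- Reduces to retry while r is zero, so the calling thread cannot proceed
-- until another thread commits a change to r.
waitNonZero :: TVar Int -> STM Int
waitNonZero r = do
  v <- readTVar r
  if v == 0 then retry else return v

-- A consumer thread blocks on r; the main thread's write unblocks it.
demo :: IO Int
demo = do
  r    <- newTVarIO 0
  done <- newTVarIO Nothing
  _ <- forkIO $ do
    v <- atomically (waitNonZero r)
    atomically (writeTVar done (Just (v * 10)))
  atomically (writeTVar r 4)
  -- The main thread now blocks in the same way until the consumer reports.
  atomically $ do
    d <- readTVar done
    maybe retry return d
```

Whichever order the scheduler picks, demo returns 40: if the consumer runs first it blocks, and if it runs after the write it proceeds immediately.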

Nested transactions are handled by rules (OR1-3). The first of
these tries the left argument of an orElse. If it succeeds normally,
then that is the result of the orElse, including any memory effects
in Θ'. If it throws an exception, that too is the result of the orElse,
and any memory effects are retained. But if M₁ retries, then rule
(OR3) discards all its effects, and instead commits to M₂. Notice
the strong similarity between (ARET) and (OR1), and between
(ATHROW) and (OR2); this is the sense in which we say that
orElse implements nested transactions.
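A quick way to observe (OR3) and (OR1) in GHC, assuming its stm package: the left alternative below writes a TVar and then retries, and the right alternative reads the same TVar to show whether that write leaked. Only the retry and normal-completion cases are exercised; the function names are hypothetical.

```haskell
-- Assumes GHC's stm package; function names are hypothetical.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, orElse, readTVar, readTVarIO, retry, writeTVar)

-- Writes to t, then retries: by (OR3) the write must be discarded.
leftWritesThenRetries :: TVar Int -> STM String
leftWritesThenRetries t = do
  writeTVar t 99
  retry

-- Reports the value of t it observes.
rightReads :: TVar Int -> STM String
rightReads t = do
  v <- readTVar t
  return ("right saw " ++ show v)

orElseDemo :: TVar Int -> STM String
orElseDemo t = leftWritesThenRetries t `orElse` rightReads t
```

Running orElseDemo on a TVar holding 0 yields "right saw 0" and leaves the TVar at 0. By contrast, a left alternative that completes normally keeps its writes, as (OR1) specifies.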

An alternative design would have (OR2) behave like (OR3); that
is, if M₁ throws an exception, we could discard its effects and try
M₂ instead. But that would invalidate the beautiful identity which
makes retry a unit for orElse and would also make orElse
asymmetric in its treatment of exceptions (discarded from M₁ but
retained for M₂). This was not a hard choice to make!


1 The repetition is indicated by the star. 
Figure 4: Operational semantics of STM Haskell


6. Implementation 

Our implementation is split into two layers. The top layer imple- 
ments the STM operations from Figure 1. This is built on top of the 
lower layer, which comprises a C library for performing memory 
transactions that is integrated in the Haskell runtime system.
Figure 5 shows the API to our C library; we consider the four groups
of operations in turn in Sections 6.1-6.4.

Concurrent Haskell is currently implemented only for a uni- 
processor. The runtime schedules lightweight Haskell threads 
within a single operating system thread. Haskell threads are only 
suspended at well-defined “safe points”; they cannot be pre-empted 
at arbitrary moments. This environment simplifies the implemen- 
tation of our library because, by construction, C runtime functions 
run without interruption. 

We are confident that a multi-processor implementation is prac- 
tical: our previous work has developed several techniques for build- 
ing multi-processor STMs in which a multi-word atomic update 
is no worse than half the speed of a uniprocessor design [10, 9]. 
These have been tested in practice on 1..96-CPU shared memory 
machines, giving scalable performance when threads are attempt- 
ing non-conflicting transactions (for instance, concurrent inserts on 


different parts of a red-black tree could commit in parallel). Even in 
the intensive workload we describe in Section 6.6, the commit
operation is less than 10% of total execution time and so the overall
consequences of using a parallel version would be low. 

6.1 Transaction logs and TVar accesses 

While executing a memory transaction, a thread-local transaction 
log is built up recording the reads and tentative writes that the 
transaction has performed. This transaction log is held in a heap- 
allocated object called a TLog that is pointed to by the Thread 
Control Block of the thread engaged in the transaction. 

The log contains an entry for each of the TVars that the memory 
transaction has accessed. Each entry contains a reference to the 
TVar involved, the old value held in the TVar when it was first 
accessed in the transaction, and the new value to be stored in the 
TVar if the transaction commits. These two values are identical in
the case of a TVar that has been read but not written by the transaction.
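The log structure just described can be illustrated with a toy, pure model. This is not the C library: TVars are named by Int ids, the heap is a Data.Map from containers, and readT, writeT, isValid and commit are hypothetical names loosely mirroring STMReadTVar, STMWriteTVar, STMIsValid and STMCommit. The model compares values with ordinary equality, whereas the runtime compares pointers.

```haskell
-- Toy model only; all names are hypothetical. Uses the containers package.
import qualified Data.Map.Strict as Map

type TVarId = Int
type Heap   = Map.Map TVarId Int

-- One entry per TVar touched: the value seen on first access, and the
-- tentative new value (equal to the old one if the TVar was only read).
data Entry = Entry { oldVal :: Int, newVal :: Int } deriving (Eq, Show)
type TLog  = Map.Map TVarId Entry

-- Reads consult the log first, so a transaction sees its own writes.
readT :: Heap -> TLog -> TVarId -> (Int, TLog)
readT heap tlog k = case Map.lookup k tlog of
  Just e  -> (newVal e, tlog)
  Nothing -> let v = heap Map.! k
             in (v, Map.insert k (Entry v v) tlog)

-- Writes only update the log; the heap is untouched until commit.
writeT :: Heap -> TLog -> TVarId -> Int -> TLog
writeT heap tlog k v = case Map.lookup k tlog of
  Just e  -> Map.insert k (e { newVal = v }) tlog
  Nothing -> Map.insert k (Entry (heap Map.! k) v) tlog

-- Valid if every TVar in the log still holds the value first seen.
isValid :: Heap -> TLog -> Bool
isValid heap = all ok . Map.toList
  where ok (k, e) = Map.lookup k heap == Just (oldVal e)

-- Commit publishes the tentative values, but only if the log is valid.
commit :: Heap -> TLog -> Maybe Heap
commit heap tlog
  | isValid heap tlog = Just (Map.union (Map.map newVal tlog) heap)
  | otherwise         = Nothing
```

Map.union is left-biased, so the logged new values override the heap's current ones for every TVar the transaction touched.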

Within a memory transaction, all TVar accesses are performed 
by STMReadTVar and STMWriteTVar (Figure 5). These accesses 
remain buffered within the thread's log, and hence invisible to 
other threads, until the transaction commits (Section 6.2): writes 
are made to the log, and reads first consult the log so that they see 







Figure 7: Transaction logs for two threads blocked on a TVar 


first calls STMIsValid to check that the transaction is not already
doomed. If it is, the stack is unrolled back to the AtomicFrame and 
the transaction is re-started. In this way, doomed transactions can 
be killed off before they have consumed too much time. It does not 
make sense to validate more frequently on a uniprocessor (indeed, 
less frequently might perform better) but, as in previous work, we 
might use an alternative scheme on a multiprocessor. 

6.3 Blocking transactions: retry 

Leaving aside the possibility of orElse for the moment, calling 
retry causes the stack to be unwound searching for the enclosing 
AtomicFrame — the types guarantee that exactly one such frame 
exists. Then STMIsValid is called, as usual, to check that the trans- 
action log has seen a consistent view of the heap, and if not the 
transaction is re-run. In the consistent case, STMWait is called. It
allocates new wait-queue entries, held in doubly-linked lists attached
to the TVars that the transaction has read, using the previously-null 
field in each TVar. Once this is done, the calling thread is respon- 
sible for blocking itself and re-entering the scheduler. 

The wait queue entries are noticed by an STMCommit which 
updates the TVars: the updater unblocks any waiters it encounters. 
Once a waiting thread is rescheduled, it is responsible for calling 
STMIsValid to assess whether it should retry execution of its 
atomic block. If the transaction is no longer valid then STMUnWait 
unlinks its wait queue entries and the caller retries its transaction 
with a fresh log. If the transaction is still valid then it leaves its wait 
queue entries in place, so that it can be woken by further updates, 
and blocks once more - this can happen only if, by the time the 
thread is scheduled, the TVars again contain pointer-equal values 
to those originally read by that thread. 
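The wait-queue bookkeeping can likewise be sketched as a toy, pure model (not GHC's runtime structures). The hypothetical functions stmWait, stmUnWait and wakeOnCommit mirror the roles of STMWait, STMUnWait and the wake-up performed by STMCommit; in this model waking a thread does not remove its entries, and a woken thread whose log turns out to be invalid is expected to call stmUnWait itself.

```haskell
-- Toy model only; names are hypothetical. Uses the containers package.
import qualified Data.Map.Strict as Map
import Data.List (nub)

type TVarId     = Int
type ThreadId   = Int
type WaitQueues = Map.Map TVarId [ThreadId]

-- Enqueue a blocked thread on every TVar in its read set.
stmWait :: ThreadId -> [TVarId] -> WaitQueues -> WaitQueues
stmWait t readSet qs = foldr (\k -> Map.insertWith (++) k [t]) qs readSet

-- Remove a thread from every queue, dropping queues that become empty.
stmUnWait :: ThreadId -> WaitQueues -> WaitQueues
stmUnWait t = Map.filter (not . null) . Map.map (filter (/= t))

-- A commit that writes the given TVars wakes every thread queued on them.
wakeOnCommit :: [TVarId] -> WaitQueues -> [ThreadId]
wakeOnCommit writes qs =
  nub (concatMap (\k -> Map.findWithDefault [] k qs) writes)
```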



(b) If both branches retry then the two logs are merged into the 
enclosing transaction and the retry propagates. 


Figure 8: Two steps in the implementation of orElse


cannot tell without further experience. A transaction may also never 
commit if it is waiting for a condition that never becomes true. 

6.6 Performance 

Evaluation of the STM implementation described here is at an early 
stage, so there are no detailed performance results to report as yet. 

However, initial measurements are encouraging. We wrote a 
simple implementation of unbounded channels in STM Haskell,
which mirrors the channel abstraction of Concurrent Haskell [22] 
implemented using MVars. We benchmarked the two implementa- 
tions by measuring the time taken to communicate a large number 
of values over a channel between two threads. They performed al- 
most identically: runtimes were the same (to within 10%), and the 
STM version allocated 50% less heap space during the run. 

Why should this be the case, given that the STM version appears 
to be doing more bookkeeping under the hood? The raw MVar op- 
erations would outperform the equivalent TVar operations if we 
benchmarked them independently, but in practice programs don’t 
perform raw MVar operations. Instead, the MVar operation is nor- 
mally wrapped in an exception handler that restores invariants in 
the event of an exception. Further protection from asynchronous 
exceptions is usually required, to prevent an asynchronous exception
from arriving before the handler has been installed [17]. This
exception-robustness is implemented in the MVar-based channel
library that we used, but it adds significant overhead to MVars.

In contrast, our STM code benefits from asynchronous excep- 
tion safety “for free”, because each channel operation is atomic. In 
short, the STM-based channels are not only clearer but also composable,
and they run just as fast as the MVar version.


7. Related work 

We build on two main categories of related work. The first, discussed
in Section 7.1, is work on transactional models of concurrency
and the design and implementation of STMs. The second, discussed in
Sections 7.2-7.3, is the designs that have been attempted to provide
forms of composability in concurrent programming languages.




7.1 Transactions 

Transactions have long been used for fault-tolerance in databases [7] 
and distributed systems. These transactions rely on stable storage 
and distributed commit protocols to protect system integrity against 
crashes and communication failures. 

Nested transactions were first proposed by Moss [19], who ex- 
tended nesting to two-phase locking protocols. The Argus lan- 
guage [16] for fault-tolerant distributed applications provided ex- 
plicit language support for nested transactions. 

Distributed transactions typically provide both synchronisation, 
ensuring that concurrently-executing transactions appear to execute 
serially, and persistence, ensuring that state changes are backed 
up on fault-tolerant, non-volatile storage. Recently, several projects 
have provided persistence without synchronisation for transactions 
running on a single machine [15, 26, 13].

By contrast, software transactional memory provides synchroni- 
sation without persistence. Because the state manipulated by mem- 
ory transactions is not intended to survive crashes or communi- 
cation failures, there is no need for distributed commit protocols 
or stable storage. It follows that many design and implementa- 
tion issues are quite different from those arising in distributed or 
persistence-only transaction systems. 

Transactional memory was originally proposed as a hardware 
architecture [11, 29] to support non-blocking synchronisation, and 
architectural support for this model remains the subject of ongoing 
research [18, 20, 24, 8]. A number of proposals have emerged for 
supporting transactional memory in software [12, 27, 4, 10, 9]. 

Work on software transactional memory has focused on libraries,
not on integrating transactional mechanisms into a programming
language. Two exceptions are Welc et al. [31], who show
how STM-like techniques can increase the concurrency available 
in systems based on Java’s synchronized blocks, and Harris and 
Fraser [9] who discuss how Java might be adapted to support non- 
blocking atomic sections. In recent work Welc et al. showed how 
I/O could be performed by backing off from an optimistic execu- 
tion scheme to a pessimistic one - however, their approach relied 
on starting with a correctly-synchronized lock-based program [30]. 

Prior work has not placed much emphasis on mechanisms for 
conditional blocking or compositionality. Herlihy et al. [10] sup- 
port syntactically nested transactions by “flattening” nested trans- 
actions to a single transaction, but provide no explicit mecha- 
nism for conditional blocking. Harris and Fraser [9] support con- 
ditional blocking using a guarded-command syntax, but lacking 
retry, such transactions could not be easily composed. Lastly, no 
prior work on memory transactions supports the equivalent of the 
orElse construct, which is essential for composition. 

7.2 Concurrent ML 

Concurrent ML [25] is an inspiring language directed squarely at 
the goal of composable concurrency. The principal abstraction is 
that of a first-class event, which allows far richer composition than
do conventional locks, or Concurrent Haskell’s MVars. One can 
draw an analogy between a CML event and an STM action in our 
language. Events can be composed as alternatives using choose, 
which is similar to our orElse, and “run” using sync, which has 
the same flavour as our atomic; in Haskell syntax their types are: 

sync   :: Event a -> a
choose :: [Event a] -> Event a

However, nothing corresponds to our notion of sequential compo- 
sition of actions. Indeed, given an Event a and an Event b, one 
cannot construct a compound event of type Event (a,b) that fires 
only when both argument events fire. This is no accident — CML 
events are carefully structured to have a single “commit point” — 
but it limits the way in which events can be composed. 
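In STM Haskell the compound action that CML cannot express is ordinary sequential composition in the STM monad. The sketch below assumes GHC's stm package; both and takeSlot are hypothetical names. If the second action retries, the whole transaction retries, so the first action's effects are undone as well.

```haskell
-- Assumes GHC's stm package; names are hypothetical.
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, orElse, readTVar, readTVarIO, retry, writeTVar)

-- One-slot source in the style of takeMVar.
takeSlot :: TVar (Maybe a) -> STM a
takeSlot t = do
  v <- readTVar t
  case v of
    Nothing -> retry
    Just x  -> writeTVar t Nothing >> return x

-- Succeeds only when both actions succeed; otherwise it retries as a whole.
both :: STM a -> STM b -> STM (a, b)
both m1 m2 = do
  a <- m1
  b <- m2
  return (a, b)
```

With the first slot full and the second empty, wrapping both in orElse return Nothing yields Nothing and leaves the first slot still full.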


This same limitation does support one form of abstraction that 
we cannot. A swap channel offers the operation 

swap :: SwapChan a -> a -> Event a

The idea is that two threads rendezvous at a SwapChan, and ex- 
change data. But no matter how many threads are simultaneously 
calling swap on the same channel, if thread A gives data to thread 
B, then B’s data must go to A. We cannot support a composable 
swap inside an STM transaction because that would require mutual 
linkage of an arbitrary number of threads whereas STM actions rep- 
resent isolated updates made by individual threads. Suppose thread 
A does a swap with thread B; and then both go on to swap with 
third parties (A1 and B1, say). Then if A1 is not ready, A’s transaction
must retry; and hence so must B’s, and so must B1’s, and so
on. In contrast, it is easy to define swap-channels with the operation 

swap :: SwapChan a -> a -> IO a

but this operation, having an IO type, does not compose (by design).
It is perhaps interesting to note for future work that this kind
of synchronization, which is hard to build with STM, is extremely 
easy to build with a chord in Benton et al.’s Polyphonic C# [1]. 

7.3 Scheme48 proposals 

Scheme 48 proposals are an optimistic-concurrency mechanism 
that supports a subset of our notion of memory transactions [14]. 
Each thread maintains a log which records the reads and writes per- 
formed using the operations provisional-car, provisional- 
set-car!, etc. The call call-ensuring-atomicity t is just 
like our atomic t; it re-runs automatically if t sees an inconsistent 
view of memory. 

Of course, Scheme is untyped, so the proposal mechanism cannot
offer any guarantees about effects; for example, there is no way
to ensure that the programmer only uses provisional-car etc.
inside a transaction, nor that transactions refrain from doing in- 
put/output. There is no mechanism for conditionally entering a pro- 
posal (and blocking if the condition does not hold), let alone for our 
modular retry. The programmer must resort to locks and condi- 
tion variables for that. Nor is there anything like orElse. 

8. Conclusion 

We have shown that STM provides a substrate for concurrent pro- 
gramming that offers far richer composition than has been available 
to date, and that it can be implemented in a practical language. 

We have used Haskell as a particularly-suitable laboratory, but 
an obvious question is this: to what extent can our results be carried 
back into the mainstream world of imperative programming? We 
believe that the idea of using constructs like retry and orElse 
can indeed be applied to other languages. For instance, in C#, one 
could indicate retry by raising a specified kind of exception and 
then express orElse as a particular kind of exception handler. 
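The suggestion can be modelled in Haskell rather than C#. The sketch below is a toy, pure state-passing transaction type (not GHC's STM), and every name in it (Tx, Retry, retryTx, orElseTx, getS, putS) is hypothetical: retry raises a Retry value, and orElse is a handler for that value which reruns the alternative from the state captured on entry, so the first branch's updates are discarded.

```haskell
-- Toy model only; all names are hypothetical.
data Retry = Retry deriving (Eq, Show)

-- A transaction over state s either completes with a result and a new
-- state, or raises Retry.
newtype Tx s a = Tx { runTx :: s -> Either Retry (a, s) }

instance Functor (Tx s) where
  fmap f (Tx g) = Tx $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Applicative (Tx s) where
  pure a = Tx $ \s -> Right (a, s)
  Tx f <*> Tx g = Tx $ \s -> do
    (h, s1) <- f s
    (a, s2) <- g s1
    return (h a, s2)

instance Monad (Tx s) where
  Tx g >>= k = Tx $ \s -> do
    (a, s1) <- g s
    runTx (k a) s1

getS :: Tx s s
getS = Tx $ \s -> Right (s, s)

putS :: s -> Tx s ()
putS s = Tx $ \_ -> Right ((), s)

-- retry "raises" the Retry exception.
retryTx :: Tx s a
retryTx = Tx $ \_ -> Left Retry

-- The handler: on Retry, run the alternative from the entry state.
orElseTx :: Tx s a -> Tx s a -> Tx s a
orElseTx (Tx g) (Tx h) = Tx $ \s -> case g s of
  Left Retry -> h s
  ok         -> ok
```

Capturing the entry state is what the handler must do in an imperative setting too: an orElse handler is only correct if it can roll back the effects of the branch that raised retry.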

An interesting distinction to notice about atomic blocks in C# 
or Java, when compared with Haskell, is that it would be necessary 
to support dynamic nesting. The reason is that, in Haskell, the code 
within an atomic block has an STM type and so the only way it 
can be run is by atomic execution: library operations do not need 
to ensure atomicity internally because it will be provided by their 
callers. In contrast, in a traditional imperative language, atomicity 
would be the responsibility of the callee rather than the caller and 
so it may be provided defensively at multiple levels in a call chain. 

In an imperative setting it is less clear how to statically prevent 
operations with irreversible side effects being performed within 
transactions: there is not ordinarily any way of indicating possible 
effects other than (in some languages) the sets of exceptions that a 
method may raise. Whether or not one believes in transactions, it 
does seem likely that some combination of effect systems and/or 



ownership types [2] will play an increasingly important role in 
concurrent programming languages, and these may contribute to 
the guarantees desirable for memory transactions. 

Our implementation forms part of GHC 6.4, which is publicly
available at http://haskell.org/ghc. Our current implementation
is for a uniprocessor, but we plan to work on a true multi-processor
implementation in 2005.